A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings

Authors

  • Hsing-Yen Ann
  • Chang-Biau Yang
  • Chiou-Ting Tseng
  • Chiou-Yi Hor
Abstract

Let X and Y be two strings of lengths n and m, respectively, and let k and l be the numbers of runs in their respective run-length encoded forms. We propose a simple algorithm for computing the longest common subsequence of X and Y in $O(kl + \min\{p_1, p_2\})$ time, where $p_1$ and $p_2$ denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively. This improves the previously known time bound $O(\min\{nl, km\})$ and, in some cases, outperforms the time bounds $O(kl \log kl)$ and $O((k + l + q) \log(k + l + q))$, where q denotes the number of matched blocks.
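
To make the block structure behind these bounds concrete, here is a minimal Python sketch. It is not the authors' $O(kl + \min\{p_1, p_2\})$ algorithm but a simpler boundary-only baseline, assuming run-length encoded strings are given as lists of (character, run length) pairs; that representation and the names rle, lcs_plain and lcs_rle are our own, hypothetical choices. The dynamic programming matrix is split into k × l blocks, one per pair of runs, and a block is matched when both runs use the same character. Inside a matched block every cell extends its diagonal neighbour by one; inside a mismatched block every cell equals the larger of the value directly above it on the block's top row and the value directly left of it on the block's left column. Computing only the bottom row and right column of each block therefore takes O(km + ln) time.

```python
from itertools import groupby


def rle(s):
    """Run-length encode s into a list of (char, run_length) pairs."""
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]


def lcs_plain(x, y):
    """Textbook O(nm) LCS length, used here only as a reference."""
    prev = [0] * (len(y) + 1)
    for xc in x:
        cur = [0]
        for j, yc in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if xc == yc else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_rle(xr, yr):
    """LCS length of two RLE strings, computing only block boundaries.

    Baseline sketch in O(km + ln) time; NOT the paper's O(kl + min{p1, p2}) method.
    """
    # Prefix sums: block row a spans DP rows I[a-1]..I[a]; likewise J for columns.
    I = [0]
    for _, run in xr:
        I.append(I[-1] + run)
    J = [0]
    for _, run in yr:
        J.append(J[-1] + run)
    n, m = I[-1], J[-1]
    k, l = len(xr), len(yr)
    rows = [[0] * (m + 1) for _ in range(k + 1)]  # rows[a][j] = L[I[a]][j]
    cols = [[0] * (n + 1) for _ in range(l + 1)]  # cols[b][i] = L[i][J[b]]
    for a in range(1, k + 1):
        i0, i1 = I[a - 1], I[a]
        top = rows[a - 1]
        for b in range(1, l + 1):
            j0, j1 = J[b - 1], J[b]
            left = cols[b - 1]
            if xr[a - 1][0] != yr[b - 1][0]:
                # Mismatched block: no diagonal steps inside, and L is monotone,
                # so L[i][j] = max(L[i0][j], L[i][j0]).
                for j in range(j0, j1 + 1):
                    rows[a][j] = max(top[j], left[i1])
                for i in range(i0, i1 + 1):
                    cols[b][i] = max(top[j1], left[i])
            else:
                # Matched block: L[i][j] = L[i-t][j-t] + t, where the diagonal walk
                # of length t stops on the left column or the top row of the block.
                for j in range(j0, j1 + 1):
                    t = min(i1 - i0, j - j0)
                    rows[a][j] = (left[i1 - t] if t == j - j0 else top[j - t]) + t
                for i in range(i0, i1 + 1):
                    t = min(i - i0, j1 - j0)
                    cols[b][i] = (top[j1 - t] if t == i - i0 else left[i - t]) + t
    return rows[k][m]
```

For example, lcs_rle(rle("aaabbbcc"), rle("abbbbca")) returns 5 (the subsequence "abbbc"), the same value lcs_plain gives on the uncompressed strings. The sketch still touches every boundary cell of every block; the bound stated in the abstract depends only on the boundaries of matched blocks, which this baseline makes no attempt to exploit.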

Similar resources

Longest common subsequence between run-length-encoded strings: a new algorithm with improved parallelism

Data compression can be used to simultaneously reduce memory, communication and computation requirements of string comparison. In this paper we address the problem of computing the length of the longest common subsequence (LCS) between run-length-encoded (RLE) strings. We exploit RLE both to reduce the complexity of LCS computation from O(M × N) to O(mN + Mn − mn), where M and N are the lengths...

Matching for Run-Length Encoded Strings

Motivation. Measuring the similarity between two strings through standard measures such as Hamming distance, edit distance, and longest common subsequence is one of the fundamental problems in pattern matching. We consider the problem of finding the longest common subsequence of two strings. A well-known dynamic programming algorithm computes the longest common subsequence of strings X and Y i...

Fast Algorithms for Computing the Constrained LCS of Run-Length Encoded Strings

In the constrained longest common subsequence (CLCS) problem, we are given two sequences X, Y and the constrained sequence P in run-length encoded (RLE) format, where |X| = n, |Y| = m and |P| = r, and the numbers of runs in RLE format are N, M and R, respectively. In this paper, we show that after the sequences are encoded, the CLCS problem can be solved in $O(NMr + r \times \min\{q_1, q_2\} + q_3)$ time,...

Development of Cache Oblivious Based Fast Multiple Longest Common Subsequence Technique (CMLCS) for Biological Sequences Prediction

A biological sequence is a single, continuous molecule of nucleic acid or protein. Classical methods for the Multiple Longest Common Subsequence (MLCS) problem are based on dynamic programming. The MLCS problem is to find the longest subsequence shared between two or more strings. For over 30 years, significant efforts have been made to find ef...

Finding a longest common subsequence between a run-length-encoded string and an uncompressed string

In this paper, we propose an $O(\min\{mN, Mn\})$ time algorithm for finding a longest common subsequence of strings X and Y with lengths M and N, respectively, and run-length-encoded lengths m and n, respectively. We propose a new recursive formula for finding a longest common subsequence of Y and X, where X is in run-length-encoded format. That is, $Y = y_1 y_2 \cdots y_N$ and $X = r_1^{l_1} r_2^{l_2} \cdots r_m^{l_m}$, where $r_i$ i...

Journal:
  • Inf. Process. Lett.

Volume 108

Publication date: 2008